Differentially Private Ordinary Least Squares: $t$-Values, Confidence Intervals and Rejecting Null-Hypotheses

Author

  • Or Sheffet

Abstract

Linear regression is one of the most prevalent techniques in data analysis. Given a large collection of samples, each composed of features $x$ and a label $y$, linear regression finds the best prediction of the label as a linear combination of the features. However, linear regression is also commonly used for its explanatory capabilities rather than for label prediction. Ordinary Least Squares (OLS) is often used in statistics to establish a correlation between an attribute (e.g., gender) and a label (e.g., income) in the presence of other, potentially correlated, features. OLS uses linear regression to estimate the correlation between the label and a feature $x_j$ on a given dataset. Then, under the assumption of a certain random generative model for the data, OLS derives $t$-values, which represent the likelihood of each real value being the true correlation in the underlying distribution. Using $t$-values, OLS can release a confidence interval: an interval on the reals that is likely to contain the true correlation. When this interval does not contain the origin, we can reject the null hypothesis, as it is likely that $x_j$ indeed has a non-zero correlation with $y$. Our work aims to achieve similar guarantees using differentially private estimators. We use the Gaussian Johnson-Lindenstrauss (JL) transform, which has been shown to satisfy differential privacy if the given data has large singular values [BBDS12]. We analyze the result of projecting the data using the JL transform under the OLS model and show how to derive approximate $t$-values. Using the approximate $t$-values, we give confidence intervals and bound the number of samples needed to reject the null hypothesis with differential privacy when the data is drawn i.i.d. from a multivariate Gaussian. When not all singular values of the data are sufficiently large, we alter the input to increase its singular values and then project it using a JL transform.
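For reference, the non-private OLS quantities described above can be sketched in plain Python. These are the estimate $\hat\beta = (X^\top X)^{-1}X^\top y$, the noise estimate $\hat\sigma^2 = \|y - X\hat\beta\|^2/(n-p)$, the $t$-value $t_j = \hat\beta_j / (\hat\sigma\sqrt{[(X^\top X)^{-1}]_{jj}})$ for the null hypothesis $\beta_j = 0$, and the interval $\hat\beta_j \pm c\,\hat\sigma\sqrt{[(X^\top X)^{-1}]_{jj}}$. This is a minimal sketch of textbook OLS inference and not the private estimator of this work. The function names `mat_inv` and `ols_inference` are hypothetical, no intercept column is added, and the critical value $c$ defaults to the normal quantile (about 1.96). That default only approximates the Student-$t$ quantile with $n-p$ degrees of freedom when $n-p$ is large.

```python
import math
from statistics import NormalDist


def mat_inv(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment each row with the matching row of the identity matrix.
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in a]


def ols_inference(X, y, j, crit=NormalDist().inv_cdf(0.975)):
    """Textbook (non-private) OLS inference for coefficient j.

    Returns (beta_j, t-value for H0: beta_j = 0, (lower, upper) interval).
    `crit` is the two-sided critical value; the default normal quantile
    stands in for the Student-t quantile with n - p degrees of freedom
    and is only accurate when n - p is large.
    """
    n, p = len(X), len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yk for row, yk in zip(X, y)) for a in range(p)]
    inv = mat_inv(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    # Residuals and the unbiased estimate of the noise variance sigma^2.
    resid = [yk - sum(row[a] * beta[a] for a in range(p)) for row, yk in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)
    se = math.sqrt(sigma2 * inv[j][j])  # standard error of beta_j
    t_value = beta[j] / se
    return beta[j], t_value, (beta[j] - crit * se, beta[j] + crit * se)
```

For example, take a few hundred samples generated as $y = 2x_1 + \text{noise}$ with an unrelated second feature. The interval for the first coefficient excludes 0, so the null hypothesis is rejected. The interval for the second coefficient contains 0 about 95% of the time.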
Projecting the altered input yields an approximation to the Ridge Regression problem, a variant of linear regression that uses an $\ell_2$-regularization term. We derive, under certain conditions, confidence intervals using the projected Ridge regression. We also derive, under different conditions, confidence intervals for the “Analyze Gauss” algorithm of Dwork et al. [DTTZ14].
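The link between "increasing the singular values" and Ridge regression can be illustrated with one common construction, which is assumed here. Stack $w \cdot I$ below the data matrix $A = [X \mid y]$ to get $A'$. Then $A'^\top A' = A^\top A + w^2 I$, so every singular value of $A'$ is at least $w$. Since a Gaussian matrix $R$ with $r$ rows satisfies $\mathbb{E}[R^\top R] = rI$, solving the normal equations on the projected columns of $RA'$ approximates ridge regression with $\lambda = w^2$. This is a minimal sketch of that algebra only. The values of $w$ and $r$ are illustrative, and the names `solve`, `exact_ridge` and `projected_ridge` are hypothetical. The sketch does not calibrate $w$ to the privacy parameters, so it is not a private algorithm.

```python
import random


def solve(m, v):
    """Solve the small linear system m x = v by Gaussian elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            a[r] = [x - f * z for x, z in zip(a[r], a[col])]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][k] * x[k] for k in range(i + 1, n))) / a[i][i]
    return x


def exact_ridge(X, y, lam):
    """Ridge estimate (X^T X + lam I)^{-1} X^T y computed directly on the data."""
    p = len(X[0])
    gram = [[sum(row[a] * row[b] for row in X) + (lam if a == b else 0.0)
             for b in range(p)] for a in range(p)]
    rhs = [sum(row[a] * yk for row, yk in zip(X, y)) for a in range(p)]
    return solve(gram, rhs)


def projected_ridge(X, y, w, r, rng):
    """Approximate ridge (lam = w**2) from a Gaussian projection of [X | y] stacked on w * I."""
    n, p = len(X), len(X[0])
    # A' = [[X, y], [w * I_{p+1}]], so A'^T A' = A^T A + w^2 I and every singular value is >= w.
    rows = [list(X[k]) + [y[k]] for k in range(n)]
    rows += [[w if c == i else 0.0 for c in range(p + 1)] for i in range(p + 1)]
    # Compute R A' for an r x (n + p + 1) matrix R with i.i.d. N(0, 1) entries, one row of R at a time.
    proj = []
    for _ in range(r):
        g = [rng.gauss(0.0, 1.0) for _ in rows]
        proj.append([sum(gk * row[c] for gk, row in zip(g, rows)) for c in range(p + 1)])
    # Normal equations on the projected columns: the first p act as X, the last as y.
    # The 1/r scaling of R^T R appears on both sides and cancels.
    gram = [[sum(z[a] * z[b] for z in proj) for b in range(p)] for a in range(p)]
    rhs = [sum(z[a] * z[p] for z in proj) for a in range(p)]
    return solve(gram, rhs)
```

With a few hundred samples, $w = 5$ and $r = 2000$, `projected_ridge(X, y, 5.0, 2000, rng)` lands close to `exact_ridge(X, y, 25.0)`. The gap between the two shrinks as $r$ grows.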

Similar resources

Differentially Private Ordinary Least Squares

Linear regression is one of the most prevalent techniques in machine learning; however, it is also common to use linear regression for its explanatory capabilities rather than label prediction. Ordinary Least Squares (OLS) is often used in statistics to establish a correlation between an attribute (e.g. gender) and a label (e.g. income) in the presence of other (potentially correlated) features...

Differentially Private Ordinary Least Squares

More specifically, we use Theorem B.1 from (Sheffet, 2015), which states that, given a matrix $A$ all of whose singular values are greater than $T(\epsilon, \delta)$, where $T(\epsilon, \delta) = 2B\left(\sqrt{2r\ln(4/\delta)} + 2\ln(4/\delta)\right)$, publishing $RA$ is $(\epsilon, \delta)$-differentially private for an $r$-row matrix $R$ whose entries are sampled i.i.d. from the standard normal distribution. Since we have that all of the singular values of $A'$ are greater than $w$ (as spec...

Standard Errors and Confidence Intervals in Inverse Problems: Sensitivity and Associated Pitfalls

We review the asymptotic theory for standard errors in classical ordinary least squares (OLS) inverse or parameter estimation problems involving general nonlinear dynamical systems where sensitivity matrices can be used to compute the asymptotic covariance matrices. We discuss possible pitfalls in computing standard errors in regions of low parameter sensitivity and/or near a steady state solut...

Life-history invariants with bounded variables cannot be distinguished from data generated by random processes using standard analyses.

A dimensionless approach to the study of life-history evolution has been applied to a wide variety of variables in the search for life-history invariants. This approach usually employs ordinary least squares (OLS) regressions of log-transformed data. In several well-studied combinations of variables the range of values of one parameter is bounded or limited by the value of the other. In this si...

Enhancing tidal harmonic analysis: Robust (hybrid L1/L2) solutions

Traditional harmonic analysis of tides is highly sensitive to omnipresent environmental noise. Robust fitting is an extension of the ordinary least squares calculation of harmonic analysis that is more resistant to broad spectrum noise. Since the variance of the amplitude and phase is calculated from the power spectrum of the residual, a calculation that filters broad spectrum noise and reduces...

Publication date: 2015